A NOTE ON AN LQG REGULATOR WITH MARKOVIAN
SWITCHING AND PATHWISE AVERAGE COST*


Mrinal K. Ghosh†, Aristotle Arapostathis‡ and Steven I. Marcus§


Abstract.  We  study  a  linear  system  with  a  Markovian  switching  parameter  perturbed  by 
white  noise.  The  cost  function  is  quadratic.  Under  certain  conditions,  we  find  a  linear  feedback 
control  which  is  almost  surely  optimal  for  the  pathwise  average  cost  over  the  infinite  planning 
horizon. 


1.  Introduction.  We  study  a  parameterized  linear  system  perturbed  by  white  noise.  The 
parameters  are  randomly  switching  from  one  state  to  the  other  and  are  modelled  as  a  finite 
state  Markov  chain;  the  values  of  the  parameter  and  the  state  of  the  linear  system  are 
assumed  to  be  known  to  the  controller.  The  objective  is  to  minimize  a  quadratic  cost  over 
the  infinite  planning  horizon.  Such  dynamics  arise  quite  often  in  numerous  applications 
involving  systems  with  multiple  modes  or  failure  modes,  such  as  fault  tolerant  control 
systems,  multiple  target  tracking,  flexible  manufacturing  systems,  etc.  [3],  [4],  [9]. 

For  a  finite  planning  horizon,  the  problem  is  well  understood,  but  difficulties  arise  when 
the  planning  horizon  is  infinite  (very  large  in  practice)  and  one  looks  for  a  steady  state 
solution.  Due  to  constant  perturbation  by  the  white  noise,  the  total  cost  is  usually  infinite, 
rendering  the  total  cost  criterion  inappropriate  for  measuring  performance.  In  this  situation, 
one  studies  the  (long-run)  average  cost  criterion.  Here  we  study  the  optimal  control  problem 
of  such  a  system  with  pathwise  average  cost.  Pathwise  results  are  very  important  in  practical 
applications,  since  we  often  deal  with  a  single  realization.  This  problem  for  a  very  general 
hybrid  system  has  been  studied  in  [5],  where  we  have  established  the  existence  of  an  almost 
surely  optimal  stationary  Markov  control  and  have  characterized  it  as  a  minimizing  selector 
of  the  Hamiltonian  associated  with  the  corresponding  dynamic  programming  equations. 
When  specializing  to  the  LQG  case,  the  existence  results  in  [5]  carry  through  with  minor 
modifications,  but  the  characterization  results  do  not,  since  the  boundedness  of  the  drift 
is  crucially  used  to  derive  the  dynamic  programming  equations.  In  this  note  we  sketch 
the  derivation  of  the  existence  result  for  the  LQG  problem  from  [5].  We  then  characterize 
the optimal control by solving the dynamic programming equations via Riccati equations.
Similar  dynamics,  but  with  an  overtaking  optimality  criterion,  have  been  recently  studied 
in  [6]. 

Our  paper  is  structured  as  follows.  Section  2  deals  with  the  problem  description.  The 
main  results  are  contained  in  Section  3.  Section  4  concludes  the  paper  with  some  remarks. 

2. Problem Description. For notational convenience, we treat the scalar case; the higher
dimensional case can be treated in an analogous manner. Let $S(t)$ be a (continuous time)
Markov chain taking values in $\mathcal{S} = \{1, 2, \ldots, N\}$ with generator
$\Lambda = [\lambda_{ij}]$ such that $\lambda_{ij} > 0$, $i \neq j$ (this condition can be
relaxed; see Remark 3.1(v)). Let $X(t)$ be a one-dimensional process given by

$$dX(t) = [A(S(t))X(t) + B(S(t))u(t)]\,dt + \sigma(S(t))\,dW(t), \qquad X(0) = X_0, \tag{2.1}$$

for $t > 0$, where $W(\cdot)$ is a one-dimensional standard Brownian motion independent of
$S(\cdot)$ and $X_0$, and $u(t)$ is a real valued nonanticipative process satisfying

$$\int_0^T u^2(t)\,dt < \infty \quad \text{a.s. (almost surely)} \tag{2.2}$$

for each $T > 0$. Here $A(i)$, $B(i)$, $\sigma(i)$ are scalars such that $B(i) \neq 0$ and
$\sigma(i) > 0$ for each $i$. We will often write $A_i$, $B_i$, etc. instead of $A(i)$, $B(i)$.
The nonanticipative process $u(t)$ satisfying (2.2) is called an admissible control. It is
called a stationary Markov control if $u(t) = v(X(t), S(t))$ for a measurable map
$v : \mathbb{R} \times \mathcal{S} \to \mathbb{R}$. With an abuse of notation, the map $v$
itself is called a stationary Markov control. A stationary Markov control $v(x, i)$ is
called a linear feedback control if $v(x, i) = k_i x$, where the $k_i$ are scalars, $i = 1, \ldots, N$.
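The dynamics (2.1) under a linear feedback control can be explored numerically. The following is a minimal sketch, not part of the analysis: it assumes an Euler-Maruyama discretization with step `dt`, a first-order approximation of the mode jumps, modes indexed from 0, and hypothetical two-mode parameter values chosen only for illustration.

```python
import numpy as np

def simulate_switched_lqg(A, B, sigma, k, Lam, x0, i0, T, dt, seed=0):
    """Euler-Maruyama path of (2.1) under the linear feedback u = k[i] * x.

    Modes are indexed 0..N-1 here (the text uses 1..N). Lam is the generator
    of the chain: nonnegative off-diagonal rates, rows summing to zero.
    """
    rng = np.random.default_rng(seed)
    Lam = np.asarray(Lam, dtype=float)
    n_steps = int(round(T / dt))
    xs = np.empty(n_steps + 1)
    modes = np.empty(n_steps + 1, dtype=int)
    x, i = float(x0), int(i0)
    xs[0], modes[0] = x, i
    for n in range(n_steps):
        u = k[i] * x
        drift = A[i] * x + B[i] * u
        x = x + drift * dt + sigma[i] * np.sqrt(dt) * rng.standard_normal()
        # First-order approximation: leave mode i during dt with prob. -Lam[i, i] * dt.
        if rng.random() < -Lam[i, i] * dt:
            jump = Lam[i].copy()
            jump[i] = 0.0
            i = int(rng.choice(len(jump), p=jump / jump.sum()))
        xs[n + 1], modes[n + 1] = x, i
    return xs, modes

# Hypothetical two-mode example (values are illustrative assumptions).
A = [1.0, -0.5]
B = [1.0, 2.0]
sigma = [1.0, 0.5]
k = [-2.0, -0.5]                      # gains making each closed-loop drift negative
Lam = [[-1.0, 1.0], [2.0, -2.0]]
xs, modes = simulate_switched_lqg(A, B, sigma, k, Lam, x0=1.0, i0=0, T=20.0, dt=1e-3)
```

The returned arrays hold the sampled state and mode on the time grid; a smaller `dt` reduces both the discretization error of the diffusion and the error of the jump approximation.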

Under a stationary Markov control $v$, the hybrid process $(X(t), S(t))$ is a strong
(time-homogeneous) Markov process [4]. Let $v$ be a stationary Markov control. The point
$(x, i) \in \mathbb{R} \times \mathcal{S}$ is said to be recurrent under $v$ if

$$P^v_{x,i}\big(X(t_n) = x,\ S(t_n) = i \ \text{for some sequence } t_n \uparrow \infty\big) = 1,$$

where $P^v_{x,i}$ is the measure under $v$ with initial condition $X(0) = x$, $S(0) = i$ in (2.1).
A point $(x, i)$ is transient under $v$ if

$$P^v_{x,i}\big(|X(t)| \to \infty \ \text{as } t \to \infty\big) = 1.$$

For a space of dimension $d > 1$, the point $(x, i) \in \mathbb{R}^d \times \mathcal{S}$ is said
to be recurrent under $v$ if, given any $\varepsilon > 0$,

$$P^v_{x,i}\big(\|X(t_n) - x\| < \varepsilon,\ S(t_n) = i \ \text{for some sequence } t_n \uparrow \infty\big) = 1.$$

If all points $(x, i)$ are recurrent, then the hybrid process $(X(t), S(t))$ is called recurrent.
It is shown in [5] that under our assumption, for any stationary Markov control, $(X(t), S(t))$
is either recurrent or transient. A recurrent $(X(t), S(t))$ will admit a unique (up to a
constant multiple) $\sigma$-finite invariant measure on $\mathbb{R} \times \mathcal{S}$ (resp.
$\mathbb{R}^d \times \mathcal{S}$ for higher dimension). The hybrid process $(X(t), S(t))$ is
called positive recurrent if it is recurrent and admits a finite invariant measure. For other
details concerning recurrence and ergodicity of this class of hybrid systems we refer the
reader to [5]. A stationary Markov control $v$ is called stable if the corresponding process
$(X(t), S(t))$ is positive recurrent. The cost function
$c(x, i, u) : \mathbb{R} \times \mathcal{S} \times \mathbb{R} \to \mathbb{R}_+$ is given by

$$c(x, i, u) = C(i)x^2 + D(i)u^2, \tag{2.3}$$

where $C_i > 0$, $D_i > 0$ for each $i$. We say that an admissible policy $u(\cdot)$ is a.s.
optimal if there exists a constant $\rho^*$ such that

$$\limsup_{T \to \infty} \frac{1}{T} \int_0^T [C(S(t))X^2(t) + D(S(t))u^2(t)]\,dt = \rho^* \quad P^u \ \text{a.s.}, \tag{2.4}$$

where $X(t)$ is the solution of (2.1) under $u(\cdot)$, and for any other admissible policy
$\tilde{u}(\cdot)$

$$\limsup_{T \to \infty} \frac{1}{T} \int_0^T [C(S(t))X^2(t) + D(S(t))\tilde{u}^2(t)]\,dt \geq \rho^* \quad P^{\tilde{u}} \ \text{a.s.}, \tag{2.5}$$

where $X(t)$ is the solution of (2.1) under $\tilde{u}(\cdot)$ with initial condition
$X(0) = X_0$. Note that in (2.4) and (2.5) the two measures with respect to which the 'a.s.'
qualifies may be defined on different measurable spaces. Our goal is to show the existence
of an a.s. optimal control and then find an a.s. optimal linear feedback control.
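For a sampled trajectory, the time average appearing in (2.4) and (2.5) can be approximated by a Riemann sum. The sketch below assumes samples of the state, control and mode on a uniform grid of step `dt`, with modes indexed from 0; the function name and the sample values are illustrative assumptions.

```python
import numpy as np

def pathwise_average_cost(xs, us, modes, C, D, dt):
    """Left-endpoint Riemann approximation of (1/T) * int_0^T [C x^2 + D u^2] dt.

    xs, us, modes are samples on a uniform grid of step dt; the last sample
    only marks the end of the horizon and is not used in the sum.
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float)
    modes = np.asarray(modes, dtype=int)
    C = np.asarray(C, dtype=float)
    D = np.asarray(D, dtype=float)
    running = C[modes] * xs**2 + D[modes] * us**2   # running cost (2.3) along the path
    T = dt * (len(xs) - 1)
    return float(np.sum(running[:-1]) * dt / T)

# Deterministic sanity check: x = 1, u = 0, mode 0 with C_0 = 2 gives average cost 2.
xs = np.ones(11)
us = np.zeros(11)
modes = np.zeros(11, dtype=int)
avg = pathwise_average_cost(xs, us, modes, C=[2.0, 1.0], D=[1.0, 0.5], dt=0.1)
```

On a simulated path the value at a single finite `T` is only an estimate of the limit superior; in practice one would inspect it for several increasing horizons.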


3.  Main  Results.  We  first  note  that  the  set  of  stable  Markov  policies  is  nonempty.  The 
proof  of  this  claim  is  rather  standard  and  relies  on  a  Lyapunov  technique  (see  [10]  for  more 
general  results). 

For a stable stationary Markov control $v$, there exists a unique invariant probability
measure, denoted by $\eta_v$, for the corresponding process $(X(t), S(t))$, and

$$\rho_v := \lim_{T \to \infty} \frac{1}{T} \int_0^T [C(S(t))X^2(t) + D(S(t))v^2(X(t), S(t))]\,dt = \sum_{i=1}^N \int_{\mathbb{R}} (C_i x^2 + D_i v^2(x, i))\,\eta_v(dx, i) \quad \text{a.s.} \tag{3.2}$$

If $v$, as above, is a stable linear feedback, then it can be shown as in [11] that
$\rho_v < \infty$. Let

$$\rho^* = \inf_v \rho_v, \tag{3.3}$$

where the infimum is over all stable linear feedback controls. Clearly, we look for a stable
linear feedback control $v^*$ such that $\rho^* = \rho_{v^*}$ and $v^*$ is a.s. optimal among
all admissible controls. Since $C_i > 0$, the cost function $c$ in (2.3) penalizes the unstable
behavior. In other words, the cost penalizes the drift of the process away from some compact
set, requiring the optimal control to exert some kind of a "centripetal force" pushing the
process back towards this compact set. Thus, the optimal control gains the desired stability
property. In the framework of [5], it is easily seen that the penalizing condition (A5)

$$\liminf_{|x| \to \infty} \inf_u c(x, i, u) > \rho^*$$

is satisfied. Thus by the results of [5], we have the following existence result.

Theorem 3.1. There exists a stable stationary Markov control which is a.s. optimal.

We now proceed to find a stable linear feedback control which is a.s. optimal. To this
end, we study the dynamic programming equation given by

$$\frac{1}{2}\sigma_i^2 V''(x, i) + \min_u \big[(A_i x + B_i u)V'(x, i) + C(i)x^2 + D(i)u^2\big] + \sum_{j=1}^N \lambda_{ij} V(x, j) = \rho, \tag{3.4}$$

where $\rho$ is a scalar and $V : \mathbb{R} \times \mathcal{S} \to \mathbb{R}$. Since the drift
is unbounded, the dynamic programming treatment of [5] is not applicable here. We look for a
trial solution of (3.4) of the form

$$V(x, i) = Q_i x^2 + R_i, \tag{3.5}$$


where $Q_i$ and $R_i$ are to be determined. Following the usual procedure, we find that the
minimizing selector in (3.4) is given by

$$v(x, i) = -\frac{B_i Q_i x}{D_i}, \tag{3.6}$$

where the $Q_i$'s are the unique positive solution of the algebraic Riccati system

$$2A_i Q_i - \frac{B_i^2 Q_i^2}{D_i} + C_i + \sum_{j=1}^N \lambda_{ij} Q_j = 0, \tag{3.7}$$

and the $R_i$'s are given by

$$\sigma_i^2 Q_i + \sum_{j=1}^N \lambda_{ij} R_j = \rho. \tag{3.8}$$

Note that (3.8) is an underdetermined system of equations in $R_i$, $\rho$, $i = 1, 2, \ldots, N$.
Also, if the $R_i$'s satisfy (3.8), then so do $R_i + k$ for any constant $k$.
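The coupled system (3.7) has no closed form in general, but it can be solved numerically. The sketch below is one possible scheme and an assumption of ours, not a method taken from the paper: a Gauss-Seidel fixed-point iteration that, for each mode, takes the positive root of the scalar quadratic in $Q_i$ with the other $Q_j$ held fixed. Parameter values are hypothetical and modes are indexed from 0.

```python
import numpy as np

def solve_coupled_riccati(A, B, C, D, Lam, tol=1e-12, max_iter=10_000):
    """Iteratively solve 2 A_i Q_i - B_i^2 Q_i^2 / D_i + C_i + sum_j Lam[i, j] Q_j = 0."""
    A, B, C, D = (np.asarray(v, dtype=float) for v in (A, B, C, D))
    Lam = np.asarray(Lam, dtype=float)
    b = B**2 / D                        # coefficient of Q_i^2 (with a minus sign)
    a = 2.0 * A + np.diag(Lam)          # coefficient of Q_i, including Lam[i, i]
    Q = np.zeros(len(A))
    for _ in range(max_iter):
        Q_old = Q.copy()
        for i in range(len(A)):
            # constant term: C_i plus the coupling to the other modes
            c = C[i] + Lam[i] @ Q - Lam[i, i] * Q[i]
            # positive root of b Q^2 - a Q - c = 0
            Q[i] = (a[i] + np.sqrt(a[i] ** 2 + 4.0 * b[i] * c)) / (2.0 * b[i])
        if np.max(np.abs(Q - Q_old)) < tol:
            return Q
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical two-mode data (illustrative assumptions).
A = [1.0, -0.5]
B = [1.0, 2.0]
C = [1.0, 1.0]
D = [1.0, 0.5]
Lam = [[-1.0, 1.0], [2.0, -2.0]]

Q = solve_coupled_riccati(A, B, C, D, Lam)
k = -np.asarray(B) * Q / np.asarray(D)          # feedback gains from (3.6)
residual = (2.0 * np.asarray(A) * Q - np.asarray(B) ** 2 * Q**2 / np.asarray(D)
            + np.asarray(C) + np.asarray(Lam) @ Q)
```

The array `residual` evaluates the left-hand side of (3.7) at the computed `Q` and should be close to zero; `k` holds the gains $k_i = -B_i Q_i / D_i$ of the linear feedback (3.6).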

Lemma 3.1. Fix an $i_0 \in \mathcal{S}$. Then there exists a unique solution $(R_i, \rho)$ to
(3.8) satisfying $R_{i_0} = 0$.


Proof. We have for any $T > 0$,

$$E[R(S(T))] - E[R(S(0))] = E\left[\int_0^T \sum_{j=1}^N \lambda_{S(t),j} R_j\,dt\right] = \rho T - E\left[\int_0^T \sigma^2(S(t))Q(S(t))\,dt\right], \tag{3.9}$$

where the second equality follows from (3.8). Dividing (3.9) by $T$, letting $T \to \infty$,
and using the fact that the chain $S(t)$ is irreducible and ergodic, we have

$$\rho = \lim_{T \to \infty} \frac{1}{T} E\left[\int_0^T \sigma^2(S(t))Q(S(t))\,dt\right] = \lim_{T \to \infty} \frac{1}{T} \int_0^T \sigma^2(S(t))Q(S(t))\,dt \ \ \text{a.s.} = \sum_{i=1}^N \sigma_i^2 Q_i \pi(i), \tag{3.10}$$

where $\pi = [\pi(1), \ldots, \pi(N)]'$ is the (unique) invariant probability measure of $S(t)$.
Let $R_i$ be defined by

$$R_i = E\left[\int_0^{\tau_{i_0}} \big(\sigma^2(S(t))Q(S(t)) - \rho\big)\,dt \,\Big|\, S(0) = i\right], \tag{3.11}$$

where $\tau_{i_0} = \inf\{t > 0 \mid S(t) = i_0\}$. Then, using Dynkin's formula, it is seen that
$R_i$ satisfies (3.8) with $R_{i_0} = 0$. Let $(R', \rho')$ be another solution to (3.8) such
that $R'_{i_0} = 0$. As before, we have

$$\rho' = \sum_{i=1}^N \sigma_i^2 Q_i \pi(i) = \rho.$$

Let $M(t) = R(S(t)) - R'(S(t))$. Then using (3.8), $M(t)$ is easily seen to be a martingale
which converges a.s. Since $S(t)$ is positive recurrent, it visits every state infinitely often.
Hence $R_i - R'_i$ must be a constant. Thus $R_i - R'_i = R_{i_0} - R'_{i_0} = 0$. □
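Lemma 3.1 and the formula (3.10) can be checked numerically on a small example. The sketch below assumes the $Q_i$'s are already known (the values used are hypothetical and do not come from solving (3.7)), indexes modes from 0, and solves (3.8) together with the normalization $R_{i_0} = 0$ as one square linear system.

```python
import numpy as np

def invariant_distribution(Lam):
    """Solve pi @ Lam = 0 with sum(pi) = 1 for an irreducible generator Lam."""
    Lam = np.asarray(Lam, dtype=float)
    N = Lam.shape[0]
    M = Lam.T.copy()
    M[-1, :] = 1.0            # replace one redundant balance equation by normalization
    rhs = np.zeros(N)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

def solve_R_and_rho(Lam, sigma, Q, i0=0):
    """Solve sigma_i^2 Q_i + sum_j Lam[i, j] R_j = rho together with R[i0] = 0."""
    Lam = np.asarray(Lam, dtype=float)
    g = np.asarray(sigma, dtype=float) ** 2 * np.asarray(Q, dtype=float)
    N = Lam.shape[0]
    M = np.zeros((N + 1, N + 1))
    M[:N, :N] = Lam
    M[:N, N] = -1.0           # the unknown rho sits in the last column
    M[N, i0] = 1.0            # normalization R[i0] = 0
    rhs = np.concatenate([-g, [0.0]])
    sol = np.linalg.solve(M, rhs)
    return sol[:N], sol[N]

# Hypothetical data (illustrative assumptions).
Lam = [[-1.0, 1.0], [2.0, -2.0]]
sigma = [1.0, 0.5]
Q = [2.0, 0.5]

pi = invariant_distribution(Lam)
R, rho = solve_R_and_rho(Lam, sigma, Q, i0=0)
rho_from_pi = float(pi @ (np.asarray(sigma) ** 2 * np.asarray(Q)))
```

For these values the invariant measure is $\pi = (2/3, 1/3)$, and both `rho` and `rho_from_pi` equal $1.375$, in agreement with (3.10); the solution has $R_0 = 0$ and $R_1 = -0.625$.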

In  view  of  these  results,  using  Ito’s  formula  and  the  pathwise  analysis  of  [5],  the  following 
result  is  now  apparent. 

Theorem 3.2. For the LQG problem (2.1)-(2.3), the linear feedback control given by (3.6)
is a.s. optimal, where the $Q_i$'s are determined by (3.7). The pathwise optimal cost is given
by (3.10).

Some  comments  are  in  order  now. 


Remark  3.1. 

(i) The condition $B_i \neq 0$ for each $i$ can be relaxed. If for some $i$, $B_i = 0$, then
the above result will still hold if $(A_i, B_i)$ are stochastically stabilizable in a certain
sense [7], [9].

(ii)  For  the  multidimensional  case,  if  Bi,  Ci,  Oi  are  positive  definite,  then  all  the  above 
results  will  hold.  But  the  positive  definiteness  of  Bi  is  a  very  strong  condition,  since 
in many cases the $B_i$'s may not even be square matrices. In such a case if we assume
that $(A_i, B_i)$ are stochastically stabilizable, then the above results will again hold.
Sufficient  conditions  for  stochastic  stabilizability  are  given  in  [2],  [7],  [9].  If  Ci  is  not 
positive  definite,  then  the  cost  does  not  necessarily  penalize  the  unstable  behavior, 
as  discussed  in  the  foregoing.  Thus  the  condition  (A5)  of  [5]  is  not  satisfied.  In 
this  case  under  a  further  detectability  condition,  the  optimality  can  be  obtained  in 
a  restricted  class  of  stationary  Markov  controls  [2]. 

(iii) Let $\rho$ be as in (3.10). Then for any admissible policy $u(t)$ it can be shown by the
pathwise analysis in [5] that

$$\liminf_{T \to \infty} \frac{1}{T} \int_0^T [C(S(t))X^2(t) + D(S(t))u^2(t)]\,dt \geq \rho \quad \text{a.s.}$$

This  establishes  the  optimality  of  the  linear  feedback  control  v  (3.6)  in  a  much 
stronger  sense,  viz.  the  most  “pessimistic”  pathwise  average  cost  under  v  is  no  worse 
than  the  most  “optimistic”  pathwise  average  cost  under  any  admissible  control. 

(iv) For $T > 0$, let $V(x, i, T)$ denote the optimal expected cost for the finite horizon
$[0, T]$. Then it can be shown as in [2] that

$$\lim_{T \to \infty} \frac{1}{T} V(x, i, T) = \rho,$$

where $\rho$ is as in (3.10). Thus the finite horizon value function, normalized by the length
of the horizon, approaches the optimal pathwise average cost as the horizon increases to
infinity. Hence for large $T$, the linear feedback control (3.6) would be a nearly optimal
control for the finite horizon case. This would be particularly useful in practical
applications, since it is computationally more economical to solve the algebraic Riccati
system than the Riccati system of differential equations.

(v) The condition $\lambda_{ij} > 0$ can be relaxed to the condition that the chain $S(t)$ is
irreducible (and hence ergodic). The existence part in [5] can be suitably modified to
make the necessary claim here. In the dynamic programming part, the existence of
a unique solution in Lemma 3.1 is clearly true under the irreducibility condition.
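The limit in Remark 3.1(iv) can be illustrated numerically. The sketch below rests on assumptions that are ours and not stated in the text: the finite horizon value function is taken in the form $V(x, i, s) = Q_i(s)x^2 + R_i(s)$ with zero terminal cost, where $s$ is the time to go, which leads to the Riccati differential system written in the docstring; this system is integrated by forward Euler with hypothetical two-mode data (modes indexed from 0).

```python
import numpy as np

def finite_horizon_value(A, B, C, D, sigma, Lam, T, dt=1e-2):
    """Integrate the time-to-go Riccati ODEs by forward Euler.

    Assumed form (s = time to go, zero terminal cost):
        dQ_i/ds = 2 A_i Q_i - B_i^2 Q_i^2 / D_i + C_i + sum_j Lam[i, j] Q_j,
        dR_i/ds = sigma_i^2 Q_i + sum_j Lam[i, j] R_j,    Q(0) = R(0) = 0.
    """
    A, B, C, D, sigma = (np.asarray(v, dtype=float) for v in (A, B, C, D, sigma))
    Lam = np.asarray(Lam, dtype=float)
    Q = np.zeros(len(A))
    R = np.zeros(len(A))
    for _ in range(int(round(T / dt))):
        dQ = 2.0 * A * Q - B**2 * Q**2 / D + C + Lam @ Q
        dR = sigma**2 * Q + Lam @ R
        Q, R = Q + dt * dQ, R + dt * dR
    return Q, R

# Hypothetical two-mode data (illustrative assumptions).
A = [1.0, -0.5]
B = [1.0, 2.0]
C = [1.0, 1.0]
D = [1.0, 0.5]
sigma = [1.0, 0.5]
Lam = np.array([[-1.0, 1.0], [2.0, -2.0]])
T = 200.0

Q_T, R_T = finite_horizon_value(A, B, C, D, sigma, Lam, T)

# Invariant measure of the chain: pi @ Lam = 0, sum(pi) = 1.
M = Lam.T.copy()
M[-1, :] = 1.0
pi = np.linalg.solve(M, np.array([0.0, 1.0]))

rho = float(pi @ (np.asarray(sigma) ** 2 * Q_T))   # (3.10), with Q(T) as a proxy for Q
x = 1.0
ratio = (Q_T * x**2 + R_T) / T                     # V(x, i, T) / T for each mode i
```

Under these assumptions each entry of `ratio` is close to `rho` for large `T`, with a gap that shrinks roughly like $1/T$; the steady-state `Q_T` also approximates the solution of the algebraic system (3.7).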

4.  Conclusions.  In  this  note,  we  have  studied  the  pathwise  optimality  of  an  LQG  regulator 
with  Markovian  switching  parameters.  We  have  assumed  that  the  Markovian  parameters 
are  known  to  the  controllers.  This  is  an  ideal  situation.  In  practice  the  controllers  may 
not  have  a  complete  knowledge  of  these  parameters.  In  this  case,  one  usually  studies  the 
corresponding  minimum  variance  filter.  Unfortunately,  this  filter  is  almost  always  infinite 
dimensional  [9].  A  computationally  efficient  suboptimal  filter  has  been  developed  in  [1],  [8]. 
We  hope  that  our  results  will  be  useful  in  the  dual  control  problem  arising  in  this  situation. 